Reserve the first level headings (#) for the start of a new Module. This will help to organize your portfolio in an intuitive fashion.
Note: Please edit this template to your heart’s content. This is meant to be the armature upon which you build your individual portfolio. You do not need to keep this instructive text in your final portfolio, although you do need to keep module and assignment names so we can identify what is what.
Third level headers (###) should be used for links to assignments, evidence worksheets, problem sets, and readings, as seen here.
Use this space to include your installation screenshots.
Detail the code you used to create, initialize, and push your portfolio repo to GitHub. This will be helpful as you will need to repeat many of these steps to update your portfolio throughout the course.
In Git:
mkdir MICB425_portfolio
cd MICB425_portfolio
Create a repository on the GitHub website.
git init
git add .
git commit -m "First commit"
git remote add origin https://remote_repository_URL
git remote -v
git push -u origin master
Paste your code from the in-class activity of recreating the example html.
The following assignment is an exercise for the reproduction of this .html document using the RStudio and RMarkdown tools we’ve shown you in class. Hopefully by the end of this, you won’t feel at all the way this poor PhD student does. We’re here to help, and when it comes to R, the internet is a really valuable resource. This open-source program has all kinds of tutorials online.
http://phdcomics.com/ Comic posted 1-17-2018
The goal of this R Markdown html challenge is to give you an opportunity to play with a bunch of different RMarkdown formatting. Consider it a chance to flex your RMarkdown muscles. Your goal is to write your own RMarkdown that rebuilds this html document as close to the original as possible. So, yes, this means you get to copy my irreverent tone exactly in your own Markdowns. It’s a little window into my psyche. Enjoy =)
Hint: go to the PhD Comics website to see if you can find the image above.
If you can’t find the exact image, just find a comparable one from the PhD Comics website and include it in your markdown.
Let’s be honest, this header is a little arbitrary. But show me that you can reproduce headers with different levels please. This is a level 3 header, for your reference (you can most easily tell this from the table of contents).
Perhaps you’re already really confused by the whole markdown thing. Maybe you’re so confused that you’ve forgotten how to add. Never fear! A calculator R is here:
1231521+12341556280987
## [1] 1.234156e+13
Or maybe, after you’ve added those numbers, you feel like it’s about time for a table! I’m going to leave all the guts of the coding here so you can see how libraries (R packages) are loaded into R (more on that later). It’s not terribly pretty, but it hints at how R works and how you will use it in the future. The summary function used below is a nice data exploration function that you may use in the future.
library(knitr)
kable(summary(cars),caption="I made this table with kable in the knitr package library")
| speed | dist |
|---|---|
| Min. : 4.0 | Min. : 2.00 |
| 1st Qu.:12.0 | 1st Qu.: 26.00 |
| Median :15.0 | Median : 36.00 |
| Mean :15.4 | Mean : 42.98 |
| 3rd Qu.:19.0 | 3rd Qu.: 56.00 |
| Max. :25.0 | Max. :120.00 |
And now you’ve almost finished your first RMarkdown! Feeling excited? We are! In fact, we’re so excited that maybe we need a big finale eh?
Here’s ours! Include a fun gif of your choice!
Silicon Valley
The template for the first Evidence Worksheet has been included here. The first thing in any assignment should be link(s) to any relevant literature (which should be included as full citations in a module references section below).
You can copy-paste in the answers you recorded when working through the evidence worksheet into this portfolio template.
As you include Evidence worksheets and Problem sets in the future, ensure that you delineate Questions/Learning Objectives/etc. by using headers that are 4th level and greater. This will still create header markings when you render (knit) the document, but will exclude these levels from the Table of Contents. That’s a good thing. You don’t want to clutter the Table of Contents too much.
Describe the numerical abundance of microbial life in relation to ecology and biogeochemistry of Earth systems.
What were the main questions being asked?
What is the abundance of prokaryotes on Earth? What is the total amount of cellular carbon stored in these prokaryotes on Earth?
Other habitats:
- Animals
  - Humans: cell density of prokaryotes on the skin multiplied by skin surface area
  - Insects such as termites: number of insects multiplied by the number of prokaryotes in each insect
- Leaves: estimated by assuming a dense population and a high leaf area index
- Air: pre-calculated

Carbon content:
- Estimated from cell numbers in soil, aquatic systems, and the subsurface
- Cellular carbon is assumed to be one-half of dry weight for soil and subsurface cells
- Take the average dry weight of prokaryotic cells and multiply by the number of cells
- Aquatic systems: average cellular carbon is assumed to be 10 fg C/cell for sedimentary prokaryotes and 20 fg C/cell for planktonic prokaryotes, multiplied by the number of cells in aquatic systems
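The carbon-content steps above come down to multiplying a cell count by an assumed carbon content per cell. The R sketch below is a minimal version of that. It takes the 10 and 20 fg C/cell values quoted above and converts units with 1 fg = 10^-15 g and 1 Pg = 10^15 g. Both function names are hypothetical. The example feeds in the aquatic cell number listed later in this module, \(1.18 \times 10^{29}\), and treats every one of those cells as planktonic. That is an illustrative simplification, not the paper's own calculation.

```r
# Convert a cell count and an assumed per-cell carbon content into petagrams (Pg) of carbon.
# fg_per_cell: femtograms of carbon per cell (e.g. 10 for sedimentary, 20 for planktonic cells)
cellular_carbon_pg <- function(cells, fg_per_cell) {
  grams <- cells * fg_per_cell * 1e-15  # 1 fg = 1e-15 g
  grams / 1e15                          # 1 Pg = 1e15 g
}

# Soil and subsurface: carbon is assumed to be half of the per-cell dry weight.
# dry_weight_fg must be supplied by the user; no value is given in this module.
carbon_from_dry_weight_pg <- function(cells, dry_weight_fg) {
  cellular_carbon_pg(cells, 0.5 * dry_weight_fg)
}

# Illustrative only: treat all aquatic cells as planktonic (20 fg C/cell)
aquatic_pg <- cellular_carbon_pg(1.18e29, 20)
aquatic_pg
# [1] 2.36
```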
Comment on the emergence of microbial life and the evolution of Earth systems
Indicate the key events in the evolution of Earth systems at each approximate moment in the time series. If times need to be adjusted or added to the timeline to fully account for the development of Earth systems, please do so.
| Time | Key events in the evolution of Earth systems |
|---|---|
| 4.6 billion years ago | Formation of Earth |
| 4.5 billion years ago | Moon formed, giving Earth its spin and tilt, day and night cycles, and seasons |
| 4.4 billion years ago | Oldest mineral found (zircon) |
| 4.1 billion years ago | Earliest evidence of life, in zircon |
| 3.8 billion years ago | Meteor bombardment stops; sedimentary rocks (weathering, ocean); carbon isotopes also in graphite; iron-rich sedimentary rocks |
| 3.5 billion years ago | Photosynthesis: ambiguous microfossils; stromatolites (organosedimentary structures produced by microbial trapping, usually but not always photosynthetic) |
| 3.0 billion years ago | Glaciation: Earth would have appeared brown |
| 2.2 billion years ago | Oxygen levels increased sharply; rocks recognized as redbeds, evidence for oxidation |
| 2.1 billion years ago | End of Snowball Earth |
| 1.9 billion years ago | Eukaryote emergence |
| 550 million years ago | Cambrian explosion |
| 400 million years ago | Emergence of land plants |
| 200,000 years ago | H. sapiens appear |
Describe the dominant physical and chemical characteristics of Earth systems at the following waypoints:
| Waypoint | Dominant physical and chemical characteristics |
|---|---|
| Hadean | Extremely hot (>100 °C ocean temperature); seawater chemistry controlled by volcanism |
| Archean | Early methanogenesis; greenhouse effect driven by CH4 and CO2 |
| Precambrian | Reducing atmosphere; glaciation ended as the greenhouse effect was enhanced by volcanoes; CO2 levels hundreds of times higher than now |
| Proterozoic | Snowball Earth; accumulation of oxygen in Earth’s atmosphere; filling of chemical sinks and increased carbon burial; nitrogen concentration close to modern levels |
| Phanerozoic | Carboniferous period; four separate glaciation periods; higher oxygen levels |
Evaluate human impacts on the ecology and biogeochemistry of Earth systems.
What are the consequences of crossing certain biophysical thresholds?
- Three of nine interlinked planetary boundaries have already been overstepped.
- Thresholds can be defined by a critical value for one or more control variables (e.g. [CO2]).
- No more than 11 million tonnes of phosphorus per year should be dumped into the ocean.

What are some preliminary measures we can consider to reach these goals?
Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?
Describe the numerical abundance of microbial life in relation to the ecology and biogeochemistry of Earth systems.
What are the primary prokaryotic habitats on Earth and how do they vary with respect to their capacity to support life? Provide a breakdown of total cell abundance for each primary habitat from the tables provided in the text.
a. Aquatic: \(1.18 \times 10^{29}\)
b. Soil: \(2.556 \times 10^{29}\)
What is the estimated prokaryotic cell abundance in the upper 200 m of the ocean and what fraction of this biomass is represented by marine cyanobacteria including Prochlorococcus? What is the significance of this ratio with respect to carbon cycling in the ocean and the atmospheric composition of the Earth?
\(3.6 \times 10^{28}\) cells in the upper 200 m. Cyanobacteria: \(\frac{4 \times 10^4 \text{ cells/ml}}{5 \times 10^5 \text{ cells/ml}} \times 100 = 8\%\)
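The 8% figure is a ratio of two cell densities, which takes a few lines of R to reproduce. This is a sketch with one stated assumption. To turn the percentage into a cell count, it applies the same fraction to all \(3.6 \times 10^{28}\) cells in the upper 200 m. It does not account for how cyanobacterial density varies with depth.

```r
# Fraction of upper-ocean prokaryotes represented by Prochlorococcus,
# using the cell densities quoted above (cells per ml)
prochlorococcus_per_ml <- 4e4
total_per_ml <- 5e5
percent_prochlorococcus <- prochlorococcus_per_ml / total_per_ml * 100
percent_prochlorococcus
# [1] 8

# Assumption: the same fraction applies throughout the upper 200 m
upper_ocean_cells <- 3.6e28
prochlorococcus_cells <- upper_ocean_cells * percent_prochlorococcus / 100
prochlorococcus_cells
```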
Based on information provided in the text and your knowledge of geography what is the deepest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this depth?
Temperature increases by about 22 °C per km of depth, so temperature is the primary limiting factor. The deepest habitat that can support life is therefore the Mariana Trench (10.9 km) plus an extra 5 km below it.
Based on information provided in the text and your knowledge of geography, what is the highest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this height?
22 km above the 8.8 km summit of Mt. Everest. A limiting factor at that height would be obtaining enough nutrients.
Based on estimates of prokaryotic habitat limitation, what is the vertical distance of the Earth’s biosphere measured in km?
22 + 8.8 + 10.9 + 5 = 46.7 km
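The vertical extent is the sum of the deepest and highest limits estimated above. Splitting it into the two components makes it easier to see where each number comes from. The variable names are illustrative only.

```r
# Vertical extent of the biosphere from the estimates above (km)
deepest_km   <- 10.9 + 5   # Mariana Trench depth plus the extra 5 km estimated above
highest_km   <- 8.8 + 22   # Mt. Everest summit plus 22 km of atmosphere
biosphere_km <- deepest_km + highest_km
biosphere_km
# [1] 46.7
```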
How was annual cellular production of prokaryotes described in Table 7 column four determined? (Provide an example of the calculation)
- \(\frac{3.6 \times 10^{28}}{16} \times 365 \approx 8.2 \times 10^{29}\) cells per year
- Population size divided by turnover time (in days), multiplied by 365 days per year
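The Table 7 rule above can be written as a small R function. This is a sketch: `annual_production` is a hypothetical helper that applies the population ÷ turnover time × 365 rule stated above. The example uses the upper-ocean values of \(3.6 \times 10^{28}\) cells and a 16-day turnover time.

```r
# Annual cellular production = population size / turnover time (days) * 365 days per year
annual_production <- function(population, turnover_days) {
  population / turnover_days * 365
}

# Upper 200 m of the ocean: 3.6e28 cells with a 16-day turnover time
upper_ocean_production <- annual_production(3.6e28, 16)
upper_ocean_production
# [1] 8.2125e+29
```

The result, about \(8.2 \times 10^{29}\) cells per year, is the value written in the calculation above.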
Given the large population size and high mutation rate of prokaryotic cells, what are the implications with respect to genetic diversity and adaptive potential? Are point mutations the only way in which microbial genomes diversify and adapt?
Prokaryotes would have high genetic diversity and the ability to adapt quickly because of their large population size and high mutation rate. Point mutations are not the only source of variation: insertions and deletions also occur. Indels are generally detrimental to a gene’s function because they can shift the reading frame, so point mutations tend to be the most common. Even so, these other types of mutations can still promote genetic diversity.
High prokaryotic abundance encourages the diversification of metabolic capabilities in prokaryotes. A larger population is likely to accumulate more mutations, some of which allow cells to take full advantage of their environment and compete for different resources.
Discuss the role of microbial diversity and formation of coupled metabolism in driving global biogeochemical cycles.
Nitrous oxide is a potent greenhouse gas that contributes to global warming.
The microbiology community’s general consensus is that humans would not be able to live without microbes. Falkowski et al. (1) commented that “Microbial life can easily live without us; we, however, cannot survive without the global catalysis and environmental transformations it provides.” It may be bold to assume they are necessary for our survival, but their existence is essential to our current lifestyle. Microbial networks facilitate the biogeochemical processes that cycle our nutrients and maintain a livable atmosphere. They are difficult to replicate because of their complexity and scale, and efforts to emulate them have resulted in environmental damage. Furthermore, their resilience makes them valuable assets in our fight against climate change.
Microbes form metabolic networks that facilitate the biogeochemical processes which fix and cycle our nutrients. Carbon and nitrogen are necessary for producing the biological building blocks that make up our bodies (2). However, we cannot use them as nutrients unless they are converted from their inorganic forms or reduced. Nitrogen can only be incorporated into biological molecules through nitrogen fixation, in which nitrogen gas (N2) is reduced to ammonium. Microbes are the only organisms that can accomplish this biotically, since their genes encode the enzyme nitrogenase, a heterodimeric complex that breaks apart the N≡N bond of N2 (1). Similarly, microbes are necessary for the movement of carbon between sinks. Soils store about three times as much organic carbon as the atmosphere holds as CO2 (3). If microbial respiration were to cease, current primary production would deplete atmospheric CO2 stocks in 12 years (4) and dramatically decrease the rate of photosynthesis in our crops.
We currently do not have the technological capacity to replace these metabolic networks because of their complexity and scale. Metabolic networks consist of individual redox reactions carried out by different macromolecular complexes, which are encoded by many genes or housed in different microbial groups. In oxygenic photosynthesis, 100 genes are needed just to encode the molecular complexes required for energy transduction (6). To further complicate matters, some pathways in biogeochemical cycles are catalyzed by diverse multispecies microbial interactions. In the nitrogen cycle, NH4+ is first oxidized to NO2- by a group of ammonia-oxidizing Bacteria or Archaea. A different group, the nitrite-oxidizing bacteria, then oxidizes NO2- to NO3- (7). The scale of these reactions is another challenge we would need to overcome. There are approximately \(4\text{-}6 \times 10^{30}\) prokaryotes on Earth in total (8), and this number does not include eukaryotic microorganisms. The sheer abundance of these microorganisms shows that microbial metabolic networks operate at a scale that we may never be able to reconstruct entirely.
Our attempts to emulate some of these metabolic networks have damaged the environment and further highlight our limitations. Humans have acquired the ability to fix nitrogen inorganically through fossil fuel combustion, almost doubling the rate of terrestrial nitrogen fixation. The excess NH4+ produced industrially is converted to NO3-, which leaches into water reserves and creates anoxic zones. This has led to a rise in atmospheric N2O, a greenhouse gas with roughly 300 times the global warming potential of CO2. This environmental damage is a testament to our inability to construct an elegant biochemical network like those of microbes. Until we can balance the inputs of our activities with an output that does not alter the climate, we will need to rely on the adaptive capabilities of microbes to produce a new steady state for the biosphere.
Microbes are invaluable allies in our efforts to combat climate change and in our foray into the Anthropocene Era because of their resilience to environmental change. We have disturbed major Earth-system processes through our interference with the nitrogen cycle and through climate change. In doing so, we have altered the very environmental conditions that enabled our development. To repair the damage, we will require the help of microbes. They can adapt to environmental changes quickly because their large numbers and rapid growth give them the capacity to create genetically diverse groups, which in turn lets them form new metabolic networks. The formation of these new networks can create a new steady state in which excess nitrogen or carbon dioxide is removed from the system at the same rate it is added (8). Indeed, up until the Industrial Revolution, the evolution and basic composition of Earth’s atmosphere were tightly linked to the evolution of microbial metabolic networks (5). For example, Cyanobacteria, which are oxygen producers as well as major nitrogen fixers, have had to evolve complex mechanisms to protect their oxygen-sensitive nitrogenase. Taken together, microbes’ ability to adapt to environmental changes through evolutionary processes makes them indispensable allies in the fight against human-driven climate change.
In conclusion, microbes are necessary because of the metabolic networks they form. These networks facilitate biogeochemical processes that are critical to our current lifestyle. Moreover, our attempts to mimic these processes have significantly damaged the environment and spurred climate change. The resilience of these metabolic networks to our activities will be instrumental as we enter the Anthropocene Era. However, our perturbation of microbially driven biogeochemical processes could lead to irreversible changes unless we practice restraint.
Utilize this space to include a bibliography of any literature you want associated with this module. We recommend keeping this as the final header under each module.
An example for Whitman and Wiebe (1998) has been included below.
Discuss the relationship between microbial community structure and metabolic diversity. Evaluate common methods for studying the diversity of microbial communities. Recognize basic design elements in metagenomic workflows.
Summarize the main results or findings.
Two fosmids were identified that contained the genes necessary and sufficient for proteorhodopsin-based phototrophy. These were cloned into E. coli cells, and both the external pH and the intracellular ATP concentration changed when the E. coli cells were exposed to light. Further, the authors showed that these fosmids contained genes sufficient to produce retinal (the PR cofactor), as long as the cells already produced the intermediate farnesyl pyrophosphate (FPP), which E. coli and many other bacteria do. Gene copy number also affected whether the phenotype could be identified. The clones also had high similarity to other PR-containing BAC clones from Alphaproteobacteria from the Mediterranean and Red Seas.
Specific emphasis should be placed on the process used to find the answer. Be as comprehensive as possible e.g. provide URLs for web sources, literature citations, etc.
(Reminders for how to format links, etc in RMarkdown are in the RMarkdown Cheat Sheets)
https://www.nature.com/articles/nature12352
Madsen EL. 2005. Identifying microorganisms responsible for ecologically significant biogeochemical processes. Nat Rev Microbiol 3:439–446. PMID15864265
Martinez A, Bradley AS, Waldbauer JR, Summons RE, DeLong EF. 2007. Proteorhodopsin photosystem gene expression enables photophosphorylation in a heterologous host. Proc Natl Acad Sci 104:5590–5595. PMID17372221
Taupp M, Mewis K, Hallam SJ. 2011. The art and design of functional metagenomic screens. Curr Opin Biotechnol 22:465–472. PMID21440432
Wooley JC, Godzik A, Friedberg I. 2010. A primer on metagenomics. PLoS Comput Biol 6. PMID20195499
• Evaluate the concept of microbial species based on environmental surveys and cultivation studies.
• Explain the relationship between microdiversity, genomic diversity and metabolic potential
• Comment on the forces mediating divergence and cohesion in natural microbial communities
• Comment on the creative tension between gene loss, duplication and acquisition as it relates to microbial genome evolution
• Identify common molecular signatures used to infer genomic identity and cohesion
• Differentiate between mobile elements and different modes of gene transfer
The difference and similarity of the genomes of CFT073, enterohemorrhagic EDL933, and a nonpathogenic laboratory strain MG1655. How do they compare to each other? What makes them distinct from one another?
Sequence analysis and annotation
islands tend to have adaptive traits
How to assess deletions that remove genes detrimental to uropathogenic lifestyle given the large number of genetic differences?
conclusions were sufficiently justified based on the evidence
I think pathogenic traits that are encoded in islands are acquired through horizontal gene transfer, whereas the ancestral backbone genes (i.e. what groups them as a species) are vertically inherited.
In class Day 1:
Assignment:
In class Day 2:
Obtain a collection of “microbial” cells from “seawater”. The cells were concentrated from different depth intervals by a marine microbiologist travelling along the Line-P transect in the northeast subarctic Pacific Ocean off the coast of Vancouver Island, British Columbia.
Sort out and identify different microbial “species” based on shared properties or traits. Record your data in this Rmarkdown using the example data as a guide.
Once you have defined your binning criteria, separate the cells using the sampling bags provided. These operational taxonomic units (OTUs) will be considered separate “species”. This problem set is based on content available at What is Biodiversity.
For example, load in the packages you will use.
#To make tables
library(kableExtra)
library(knitr)
#To manipulate and plot data
library(tidyverse)
Then load in the data. You should use a similar format to record your community data.
seawater_data = read.csv("candy-data.csv",
header = TRUE)
seawater_data_small = read.csv("candy-data_small.csv",
header = TRUE)
Finally, use these data to create a table.
seawater_data %>%
  kable("html") %>%
  kable_styling(bootstrap_options = "striped", font_size = 10, full_width = FALSE)
| X | name | occurances |
|---|---|---|
| 1 | m&m green | 28 |
| 2 | m&m red | 28 |
| 3 | m&m blue | 60 |
| 4 | m&m yellow | 44 |
| 5 | m&m brown | 30 |
| 6 | m&m orange | 63 |
| 7 | skittle brown | 39 |
| 8 | skittle red | 33 |
| 9 | skittle green | 42 |
| 10 | skittle orange | 35 |
| 11 | skittle yellow | 23 |
| 12 | gummi bear red | 15 |
| 13 | gummi bear pink | 16 |
| 14 | gummi bear green | 18 |
| 15 | gummi bear orange | 15 |
| 16 | gummi bear yellow | 19 |
| 17 | gummi bear white | 16 |
| 18 | m&i pink | 39 |
| 19 | m & i green | 36 |
| 20 | m&i yellow | 27 |
| 21 | m&i orange | 32 |
| 22 | m&ired | 40 |
| 23 | worms red | 14 |
| 24 | balls yellow | 4 |
| 25 | balls green | 5 |
| 26 | balls purple | 3 |
| 27 | balls orange | 5 |
| 28 | balls red | 7 |
| 29 | chocolate kiss | 16 |
| 30 | lego pink | 7 |
| 31 | lego yellow | 5 |
| 32 | lego blue | 4 |
| NA | | 768 |
seawater_data_small %>%
  kable("html") %>%
  kable_styling(bootstrap_options = "striped", font_size = 10, full_width = FALSE)
| X | name | characteristics | occurances |
|---|---|---|---|
| 1 | green ball | NA | 3 |
| 2 | gummi bear green | NA | 3 |
| 3 | gummi bear red | NA | 2 |
| 4 | gummi bear yellow | NA | 3 |
| 5 | gummi clear | NA | 1 |
| 6 | gummi orange | NA | 3 |
| 7 | m & i green | NA | 4 |
| 8 | m & i orange | NA | 1 |
| 9 | m & i pink | NA | 5 |
| 10 | m & i red | NA | 10 |
| 11 | m & i yellow | NA | 7 |
| 12 | m & m blue | NA | 12 |
| 13 | m & m brown | NA | 4 |
| 14 | m & m green | NA | 4 |
| 15 | m & m orange | NA | 11 |
| 16 | m & m yellow | NA | 5 |
| 17 | m & m red | NA | 2 |
| 18 | pink brick | NA | 1 |
| 19 | purple ball | NA | 2 |
| 20 | red lines | NA | 4 |
| 21 | skittle brown | NA | 2 |
| 22 | skittle green | NA | 10 |
| 23 | skittle orange | NA | 7 |
| 24 | skittle purple | NA | 5 |
| 25 | skittle red | NA | 4 |
| 26 | skittle yellow | NA | 6 |
| NA | | NA | 121 |
For your community:
To help answer the questions raised in Part 1, you will conduct a simple but informative analysis that is a standard practice in biodiversity surveys. This analysis involves constructing a collector’s curve that plots the cumulative number of species observed along the y-axis and the cumulative number of individuals classified along the x-axis. This curve is an increasing function with a slope that will decrease as more individuals are classified and as fewer species remain to be identified. If sampling stops while the curve is still rapidly increasing then this indicates that sampling is incomplete and many species remain undetected. Alternatively, if the slope of the curve reaches zero (flattens out), sampling is likely more than adequate.
To construct the curve for your samples, choose a cell within the collection at random. This will be your first data point, such that X = 1 and Y = 1. Next, move consistently in any direction to a new cell and record whether it is different from the first. In this step X = 2, but Y may remain 1 or change to 2 if the individual represents a new species. Repeat this process until you have proceeded through all cells in your collection.
For example, we load in these data.
collection_curve = read.csv("collection_curve.csv",
header = FALSE)
And then create a plot. We will use a scatterplot (geom_point) to plot the raw data and then add a smoother to see the overall trend of the data.
ggplot(collection_curve, aes(x=V1, y=V2)) +
geom_point() +
geom_smooth() +
labs(x="Cumulative number of individuals classified", y="Cumulative number of species observed")
## `geom_smooth()` using method = 'loess'
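The classification walk described above can also be turned into curve coordinates directly. Record the label of each individual in the order it was classified. A new species is then simply a label that has not appeared before. This is a base-R sketch: `collector_curve` is a hypothetical helper, and the candy sequence is made up rather than taken from the class data.

```r
# Build collector's-curve coordinates from the order in which individuals were classified.
# individuals = cumulative individuals classified (x); species = cumulative species observed (y)
collector_curve <- function(species_sequence) {
  data.frame(
    individuals = seq_along(species_sequence),
    species     = cumsum(!duplicated(species_sequence))  # TRUE the first time a label appears
  )
}

# Hypothetical sequence of classifications (not the class data)
sampled <- c("m&m blue", "skittle red", "m&m blue", "gummi bear red",
             "skittle red", "m&m blue", "lego pink")
curve <- collector_curve(sampled)
curve$species
# [1] 1 2 2 3 3 3 4
```

The resulting data frame could be plotted in the same way as the example, for instance with `ggplot(curve, aes(x = individuals, y = species)) + geom_point()`.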
For your sample:
Using the table from Part 1, calculate species diversity using the following indices or metrics.
\(\frac{1}{D}\) where \(D = \sum p_i^2\)
\(p_i\) = the fractional abundance of the \(i^{th}\) species
For example, using our community data (32 species, 768 individuals in total), \(\frac{1}{D}\) =
species1 = 28/(768)
species2 = 28/(768)
species3 = 60/(768)
species4 = 44/(768)
species5 = 30/768
species6 = 63/768
species7 = 39/768
species8 = 33/768
species9 = 42/768
species10 = 35/768
species11 = 23/768
species12 = 15/768
species13 = 16/768
species14 = 18/768
species15 = 15/768
species16 = 19/768
species17 = 16/768
species18 = 39/768
species19 = 36/768
species20 = 27/768
species21 = 32/768
species22 = 40/768
species23 = 14/768
species24 = 4/768
species25 = 5/768
species26 = 3/768
species27 = 5/768
species28 = 7/768
species29 = 16/768
species30 = 7/768
species31 = 5/768
species32 = 4/768
1 / (species1^2 + species2^2 + species3^2 + species4^2 + species5^2 + species6^2 + species7^2 + species8^2 + species9^2 + species10^2 + species11^2 + species12^2 + species13^2 + species14^2 + species15^2 + species16^2 + species17^2 + species18^2 + species19^2 + species20^2 + species21^2 + species22^2 + species23^2 + species24^2 + species25^2 + species26^2 + species27^2 + species28^2 + species29^2 + species30^2 + species31^2 + species32^2)
## [1] 22.18718
Species1 = 3/121
Species2 = 3/121
Species3 = 2/121
Species4 = 3/121
Species5 = 1/121
Species6 = 4/121
Species7 = 1/121
Species8 = 5/121
Species9 = 10/121
Species10 = 7/121
Species11 = 12/121
Species12 = 4/121
Species13 = 4/121
Species14 = 11/121
Species15 = 5/121
Species16 = 2/121
Species17 = 1/121
Species18 = 2/121
Species19 = 4/121
Species20 = 2/121
Species21 = 10/121
Species22 = 7/121
Species23 = 5/121
Species24 = 4/121
Species25 = 6/121
Species26 = 3/121
1/ (Species1^2 + Species2^2 + Species3^2 + Species4^2 + Species5^2 + Species6^2 + Species7^2 + Species8^2 + Species9^2 + Species10^2 + Species11^2 + Species12^2 + Species13^2 + Species14^2 + Species15^2 + Species16^2 + Species17^2 + Species18^2 + Species19^2 + Species20^2 + Species21^2 + Species22^2 + Species23^2 + Species24^2 + Species25^2 + Species26^2 )
## [1] 18.09765
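The species-by-species arithmetic above can be written more compactly with a vector of counts. The counts below are typed in from the two community tables, leaving out their total rows. `inverse_simpson` is a hypothetical helper, not a function from a package. The last line checks the property described in the text: a perfectly even community reaches the maximum value, which equals its number of species.

```r
# Inverse Simpson index from a vector of counts: 1 / sum(p_i^2)
inverse_simpson <- function(counts) {
  p <- counts / sum(counts)   # fractional abundance of each species
  1 / sum(p^2)
}

# Counts from the two community tables above (total rows excluded)
large_counts <- c(28, 28, 60, 44, 30, 63, 39, 33, 42, 35, 23, 15, 16, 18, 15, 19,
                  16, 39, 36, 27, 32, 40, 14, 4, 5, 3, 5, 7, 16, 7, 5, 4)
small_counts <- c(3, 3, 2, 3, 1, 3, 4, 1, 5, 10, 7, 12, 4, 4, 11, 5,
                  2, 1, 2, 4, 2, 10, 7, 5, 4, 6)

inverse_simpson(large_counts)   # about 22.18718
inverse_simpson(small_counts)   # about 18.09765

# A perfectly even community of 5 species reaches the maximum value of 5
inverse_simpson(rep(10, 5))
```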
The higher the value, the greater the diversity. The maximum value is the number of species in the sample, which occurs when all species contain an equal number of individuals. Because the index reflects both the number of species present (richness) and the relative proportions of each species within a community (evenness), this metric is a diversity metric. Consider that a community can have the same number of species (equal richness) but manifest a skewed distribution in the proportion of each species (unequal evenness), which would result in different diversity values.
Another way to calculate diversity is to use the empirical data to estimate how many species are present in a sample, which can be more than the number actually observed. Here, we use the Chao1 richness estimator.
\(S_{chao1} = S_{obs} + \frac{a^2}{2b}\)
\(S_{obs}\) = total number of species observed; \(a\) = number of species observed once (singletons); \(b\) = number of species observed exactly twice (doubletons)
So for our two communities, \(S_{chao1}\) =
26 + 3^2/(2*4)
## [1] 27.125
32 + 0  # no species were observed once (a = 0), so no correction term is added
## [1] 32
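The Chao1 formula can also be applied straight to the count vectors, so singletons and doubletons are counted by code rather than by hand. From the small community table, that gives \(a = 3\) (gummi clear, m & i orange, pink brick) and \(b = 4\) (gummi bear red, m & m red, purple ball, skittle brown). `chao1` is a hypothetical helper. When \(b = 0\), the classic formula divides by zero. In that case this sketch falls back to the bias-corrected form \(S_{obs} + \frac{a(a-1)}{2}\), which is an assumption added here and not part of the formula above.

```r
# Classic Chao1: S_obs + a^2 / (2b), with a = singletons and b = doubletons.
# If b = 0, fall back to the bias-corrected form S_obs + a(a - 1) / 2 (assumption).
chao1 <- function(counts) {
  counts <- counts[counts > 0]
  s_obs <- length(counts)
  a <- sum(counts == 1)
  b <- sum(counts == 2)
  if (b > 0) s_obs + a^2 / (2 * b) else s_obs + a * (a - 1) / 2
}

small_counts <- c(3, 3, 2, 3, 1, 3, 4, 1, 5, 10, 7, 12, 4, 4, 11, 5,
                  2, 1, 2, 4, 2, 10, 7, 5, 4, 6)
large_counts <- c(28, 28, 60, 44, 30, 63, 39, 33, 42, 35, 23, 15, 16, 18, 15, 19,
                  16, 39, 36, 27, 32, 40, 14, 4, 5, 3, 5, 7, 16, 7, 5, 4)

chao1(small_counts)   # 26 + 3^2 / (2 * 4) = 27.125
chao1(large_counts)   # no singletons or doubletons, so 32
```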
We’ve been doing the above calculations by hand, which is a very good exercise to aid in understanding the math behind these estimates. Not surprisingly, these same calculations can be done with R functions. Since we just have a species table, we will use the vegan package. You will need to install this package if you have not done so previously.
library(vegan)
First, we must remove the unnecessary data columns and transpose the data so that vegan reads it as a species table, with species as columns and samples as rows (of which you only have 1).
data_diversity =
seawater_data %>%
select(name, occurances) %>%
spread(name, occurances)
data_diversity
## V1 balls green balls orange balls purple balls red balls yellow
## 1 768 5 5 3 7 4
## chocolate kiss gummi bear green gummi bear orange gummi bear pink
## 1 16 18 15 16
## gummi bear red gummi bear white gummi bear yellow lego blue lego pink
## 1 15 16 19 4 7
## lego yellow m & i green m&i orange m&i pink m&i yellow m&ired m&m blue
## 1 5 36 32 39 27 40 60
## m&m brown m&m green m&m orange m&m red m&m yellow skittle brown
## 1 30 28 63 28 44 39
## skittle green skittle orange skittle red skittle yellow worms red
## 1 42 35 33 23 14
small_data_diversity =
seawater_data_small %>%
select(name, occurances) %>%
spread(name, occurances)
small_data_diversity
## V1 green ball gummi bear green gummi bear red gummi bear yellow
## 1 121 3 3 2 3
## gummi clear gummi orange m & i green m & i orange m & i pink m & i red
## 1 1 3 4 1 5 10
## m & i yellow m & m blue m & m brown m & m green m & m orange m & m red
## 1 7 12 4 4 11 2
## m & m yellow pink brick purple ball red lines skittle brown
## 1 5 1 2 4 2
## skittle green skittle orange skittle purple skittle red skittle yellow
## 1 10 7 5 4 6
Then we can calculate the Simpson Reciprocal Index using the diversity function.
diversity(data_diversity, index="invsimpson")
## [1] 3.827491
diversity(small_data_diversity, index = "invsimpson")
## [1] 3.79055
And we can calculate the Chao1 richness estimator (and others by default) with the specpool function for extrapolated species richness. This function rounds to the nearest whole number, so the value will be slightly different from what you’ve calculated above.
specpool(data_diversity)
## Species chao chao.se jack1 jack1.se jack2 boot boot.se n
## All 33 33 0 33 0 33 33 0 1
specpool(small_data_diversity)
## Species chao chao.se jack1 jack1.se jack2 boot boot.se n
## All 27 27 0 27 0 27 27 0 1
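The vegan results above do not match the hand calculations. `diversity()` reports about 3.83 and 3.79, against 22.19 and 18.10 by hand. `specpool()` reports 33 and 27 species, while the tables list 32 and 26. One likely explanation is the blank-named total row in each CSV file. After `spread()`, that row appears as the `V1` column (768 and 121), so vegan treats the total as one extra, very abundant species. The base-R sketch below shows the effect. Its data frame is typed in to mimic the layout of the small CSV, and the species names are hypothetical placeholders. In the tidyverse pipelines above, adding `filter(!is.na(X))` before `select()` would be one way to drop the total row. That fix assumes the total row is the only one with a missing `X`.

```r
inverse_simpson <- function(counts) {
  p <- counts / sum(counts)
  1 / sum(p^2)
}

# Mimic the small community table, including its total row (X = NA, blank name)
small_counts <- c(3, 3, 2, 3, 1, 3, 4, 1, 5, 10, 7, 12, 4, 4, 11, 5,
                  2, 1, 2, 4, 2, 10, 7, 5, 4, 6)
seawater_small <- data.frame(
  X = c(seq_along(small_counts), NA),
  name = c(paste("species", seq_along(small_counts)), ""),  # hypothetical names
  occurances = c(small_counts, sum(small_counts))
)

with_total    <- seawater_small$occurances
without_total <- seawater_small$occurances[!is.na(seawater_small$X)]

inverse_simpson(with_total)      # about 3.79, as reported by diversity() above
inverse_simpson(without_total)   # about 18.10, matching the hand calculation
length(with_total)               # 27 "species" when the total row is kept
length(without_total)            # 26 species
```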
In Project 1, you will also see functions for calculating alpha-diversity in the phyloseq package since we will be working with data in that form.
For your sample:
If you are stuck on some of these final questions, reading the Kunin et al. 2010 and Lundin et al. 2012 papers may provide helpful insights.
How does the measure of diversity depend on the definition of species in your samples?
The more specific the definition of species for our samples, the more diverse our samples appear to be. If we had a broader definition of what constitutes a species (e.g. grouping candy only by colour), then our diversity would be lower.
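A toy example shows how the species definition alone changes the observed richness. The candy "individuals" below are hypothetical. Each is described by type and colour and then counted under a narrow definition (type plus colour) and a broad one (colour only).

```r
# Hypothetical candy "individuals", each described by type and colour
candy <- data.frame(
  type   = c("m&m", "m&m", "skittle", "skittle", "gummi bear", "m&m"),
  colour = c("red", "blue", "red", "green", "red", "red")
)

# Narrow species definition: type + colour
richness_narrow <- nrow(unique(candy[, c("type", "colour")]))
# Broad species definition: colour only
richness_broad <- length(unique(candy$colour))

richness_narrow  # 5
richness_broad   # 3
```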
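How might different sequencing technologies influence observed diversity in a sample?
Some sequencing methods might overestimate the diversity and the number of species in a sample. For example, sequencing errors that are not removed by quality filtering can appear as new, rare species.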
What defines a microbial species is still an ongoing debate in the research community. The challenge in defining a microbial species can be attributed to several factors. Unlike eukaryotes, prokaryotes are usually haploid organisms that reproduce asexually. They also cannot be easily distinguished on the basis of phenotypic traits alone. To circumvent this, researchers have attempted to supplement phenotypic approaches with genotypic ones, in which organisms are considered to be the same species if 70% of their DNA hybridizes (1). However, this definition is complicated by horizontal gene transfer (HGT) events. Through HGT, the uptake of genetic material from the environment can cause different microbial species to share more sequence homology and more functional traits.
Functional metabolic genes are more likely to undergo HGT. As a result, researchers have relied increasingly on the highly conserved 16S rRNA gene as sequencing techniques and metagenomic approaches have become more advanced. The gene is a slowly evolving ‘phylogenetic’ anchor that is useful both for identifying species and for establishing evolutionary relationships (2). Two microbes are generally considered the same species if their 16S rRNA genes share 97% or greater sequence similarity (3), but this approach to species identification is not faultless. For example, poor quality filtering of 16S rRNA pyrosequencing data can lead to overestimates of diversity in an environmental sample (4). It has also been shown that two organisms in the same genus can share 99% 16S rRNA gene identity yet still be two different species (3). These issues indicate that a species classification scheme relying purely on 16S rRNA is potentially problematic and unreliable.
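The 97% rule is a threshold on pairwise sequence identity. Below is a minimal sketch of applying it, assuming the two sequences are already aligned and of equal length. Real analyses compute identity from proper alignments that handle gaps. The helper names and the 100-nt reference fragment are hypothetical.

```r
# Hypothetical helper: percent identity between two pre-aligned,
# equal-length sequences (gaps are treated like any other character).
percent_identity <- function(seq_a, seq_b) {
  a <- strsplit(seq_a, "")[[1]]
  b <- strsplit(seq_b, "")[[1]]
  stopifnot(length(a) == length(b))
  100 * sum(a == b) / length(a)
}

# Apply the commonly used 97% cutoff (an operational convention, not a rule)
same_species_16s <- function(seq_a, seq_b, threshold = 97) {
  percent_identity(seq_a, seq_b) >= threshold
}

ref <- strrep("ACGT", 25)           # hypothetical 100-nt fragment
two_diffs <- ref
substr(two_diffs, 1, 2) <- "TG"     # 2 mismatches -> 98% identity
four_diffs <- ref
substr(four_diffs, 1, 4) <- "TGCA"  # 4 mismatches -> 96% identity

percent_identity(ref, two_diffs)    # 98
same_species_16s(ref, two_diffs)    # TRUE
same_species_16s(ref, four_diffs)   # FALSE
```

A single numeric cutoff like this cannot tell apart two distinct species that happen to share 99% identity. This is the limitation described above.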
Perhaps our increased dependence on 16S rRNA in defining bacterial species stems from its simplicity and convenience. Evolutionary pressures, along with the transfer of entire metabolic pathways by HGT (5), can give rise to new microbial species and to ecotypes, which are members of the same species that have evolved and adapted to a specific environment. Three strains of E. coli occupy different niches (6): the uropathogenic CFT073, the enterohemorrhagic EDL933, and the laboratory strain MG1655. Intriguingly, comparative genome analysis showed that only about 39% of their combined, nonredundant set of proteins is common to all three strains (6). Despite this, they would be classified as the same species based on their 16S rRNA sequences and shared genomic backbone. The differences between their genomes lie largely in genes acquired through HGT, including those encoding the pathogenic traits needed to occupy their specific niches. These E. coli strains highlight how HGT-driven divergence that is invisible to 16S rRNA marker genes could erode the microbial species definition.
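A shared-gene fraction like the one reported for these E. coli strains compares the genes common to all strains (the core genome) with the combined, nonredundant gene set (the pan-genome). The sketch below computes that ratio for three hypothetical strains whose gene IDs are made up. It is not Welch et al.'s method or data.

```r
# Hypothetical gene content for three strains (IDs invented for illustration)
strain_genes <- list(
  strain_A = c("g1", "g2", "g3", "g4", "g5"),
  strain_B = c("g1", "g2", "g3", "g6", "g7"),
  strain_C = c("g1", "g2", "g8")
)

pan_genome  <- Reduce(union, strain_genes)      # combined nonredundant set
core_genome <- Reduce(intersect, strain_genes)  # genes shared by all strains

core_fraction <- length(core_genome) / length(pan_genome)
core_fraction  # 2 shared genes out of 8 total = 0.25
```

Strains that share an identical marker gene can still have a small core fraction when most of their gene content is strain-specific.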
HGT blurs our attempts to define microbial species by distributing different metabolic pathways among members of the same species. Yet the same mechanism has been instrumental in preserving certain metabolic pathways. The diversity of metabolic pathways is maintained over time as HGT distributes metabolic traits across different lineages and environments. One example is the set of genes encoding the nitrogenase enzyme. These genes are evolutionarily favourable and found in many microbial lineages because they allow organisms to fix atmospheric nitrogen (N2) into ammonia for anabolism (7). Functional gene sets such as the nitrogenase genes are necessary for keeping nutrients flowing on Earth and, by extension, for maintaining biogeochemical cycles. Many of these gene families likely originated during a burst of genetic innovation in the Archean eon, roughly 3.3 to 2.8 billion years ago (8). HGT then played a key role in ensuring the survival of these functional genes. By distributing them across different ecological niches, it protected them from being lost through gene duplication and mass extinction events. Thus, HGT has directly influenced the state of biogeochemical cycles through the preservation of key metabolic pathways.
To summarize, there are two main approaches to defining microbial species: a purely genotypic approach or a functional approach. A genotypic approach, such as measuring the extent to which genomic DNA hybridizes or how similar 16S rRNA sequences are, is straightforward but grossly oversimplifies the bacterial species concept. Ecotypes illustrate this. Organisms can be classified as the same species based on their 16S rRNA despite occupying different niches and having large functional differences due to HGT. On the other hand, we cannot define microbial species by their functional attributes alone either, because HGT has also distributed metabolic pathways across microbial species of different lineages. Despite the flaws in these approaches, a microbial species definition is necessary in some contexts. This is especially apparent in a medical setting, where without one physicians could not diagnose diseases caused by pathogenic microbes or prescribe appropriate treatments. For practicality’s sake, it is probably best to combine the two approaches and adjust the definition to the setting. A “looser” definition might be beneficial in research. There, bacterial species could be grouped by genomic similarity as a starting point, keeping the definition fluid until a consensus is reached.
1. Cho JC, Tiedje JM. 2001. Bacterial species determination from DNA-DNA hybridization by using genome fragments and DNA microarrays. Appl Environ Microbiol 67:3677–3682.
2. Krause L, Diaz NN, Goesmann A, Kelley S, Nattkemper TW, Rohwer F, Edwards RA, Stoye J. 2008. Phylogenetic classification of short environmental DNA fragments. Nucleic Acids Res 36:2230–2239.
3. Nguyen NP, Warnow T, Pop M, White B. 2016. A perspective on 16S rRNA operational taxonomic unit clustering using sequence similarity. npj Biofilms Microbiomes.
4. Kunin V, Engelbrektson A, Ochman H, Hugenholtz P. 2010. Wrinkles in the rare biosphere: Pyrosequencing errors can lead to artificial inflation of diversity estimates. Environ Microbiol 12:118–123.
5. Falkowski PG, Fenchel T, DeLong EF. 2008. The microbial engines that drive Earth’s biogeochemical cycles. Science 320:1034–1039.
6. Welch RA, Burland V, Plunkett G, Redford P, Roesch P, Rasko D, Buckles EL, Liou S-R, Boutin A, Hackett J, Stroud D, Mayhew GF, Rose DJ, Zhou S, Schwartz DC, Perna NT, Mobley HLT, Donnenberg MS, Blattner FR. 2002. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc Natl Acad Sci U S A 99:17020–17024.
7. Falkowski PG. 1997. Evolution of the nitrogen cycle and its influence on the biological sequestration of CO2 in the ocean. Nature 387:272–275.
8. David LA, Alm EJ. 2011. Rapid evolutionary innovation during an Archaean genetic expansion. Nature 469:93–96.
htmltools::tags$iframe(title="MICB425 Project 1", src="Proj1.html", height=1000, width=1000)
Callahan BJ, McMurdie PJ, Holmes SP. 2017. Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. ISME J 11:2639–2643. PMID28731476
Gaudet AD, Ramer LM, Nakonechny J, Cragg JJ, Ramer MS. 2010. Small-group learning in an upper-level university biology class enhances academic performance and student attitudes toward group work. PLoS One 5. PMID21209910
Hallam SJ, Torres-Beltrán M, Hawley AK. 2017. Monitoring microbial responses to ocean deoxygenation in a model oxygen minimum zone. Sci Data 4:1–3. PMID29087370
Hawley AK, Brewer HM, Norbeck AD, Paša-Tolić L, Hallam SJ. 2014. Metaproteomics reveals differential modes of metabolic coupling among ubiquitous oxygen minimum zone microbes. Proc Natl Acad Sci 111:11395–11400. PMID25053816
Kunin V, Engelbrektson A, Ochman H, Hugenholtz P. 2010. Wrinkles in the rare biosphere: Pyrosequencing errors can lead to artificial inflation of diversity estimates. Environ Microbiol 12:118–123. PMID19725865
Cordero OX, Ventouras L-A, DeLong EF, Polz MF. 2012. Public good dynamics drive evolution of iron acquisition strategies in natural bacterioplankton populations. Proc Natl Acad Sci 109:20059–20064. PMID23169633
Lundin D, Severin I, Logue JB, Östman Ö, Andersson AF, Lindström ES. 2012. Which sequencing depth is sufficient to describe patterns in bacterial alpha- and beta-diversity? Environ Microbiol Rep 4:367–372. PMID23760801
Morris JJ, Lenski RE, Zinser ER. 2012. The Black Queen Hypothesis: Evolution of Dependencies through Adaptive Gene Loss. mBio 3:1–7. PMID22448042
Thompson JR, Pacocha S, Pharino C, Klepac-Ceraj V, Hunt DE, Benoit J, Sarma-Rupavtarm R, Distel DL, Polz MF. 2005. Genotypic diversity within a natural coastal bacterioplankton population. Science 307:1311–1313. PMID15731455
Welch RA, Burland V, Plunkett G, Redford P, Roesch P, Rasko D, Buckles EL, Liou S-R, Boutin A, Hackett J, Stroud D, Mayhew GF, Rose DJ, Zhou S, Schwartz DC, Perna NT, Mobley HLT, Donnenberg MS, Blattner FR. 2002. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc Natl Acad Sci U S A 99:17020–4. PMID12471157
Torres-Beltrán M, Hawley AK, Capelle D, Zaikova E, Walsh DA, Mueller A, Scofield M, Payne C, Pakhomova L, Kheirandish S, Finke J, Bhatia M, Shevchuk O, Gies EA, Fairley D, Michiels C, Suttle CA, Whitney F, Crowe SA, Tortell PD, Hallam SJ. 2017. A compendium of geochemical information from the Saanich Inlet water column. Sci Data 4:1–10. PMID29087371
Welch DBM, Huse SM. 2011. Microbial Diversity in the Deep Sea and the Underexplored “Rare Biosphere.” Handb Mol Microb Ecol II Metagenomics Differ Habitats 243–252. PMID16880384
htmltools::tags$iframe(title="MICB425 Project 2", src="Proj2.html", height=1000, width=1000)